Received: from iesd.auc.dk by optima.cs.arizona.edu (5.65c/15) via SMTP
    id AA11219; Tue, 13 Apr 1993 10:36:09 MST
Received: from yellow.iesd.auc.dk by iesd.auc.dk with SMTP id AA01169
    (5.65c8/IDA-1.5/MD for <tsql@cs.arizona.edu>);
    Tue, 13 Apr 1993 19:35:15 +0200
Date: Tue, 13 Apr 1993 19:35:15 +0200
From: "Christian S. Jensen" <csj@iesd.auc.dk>
Message-Id: <199304131735.AA01169@iesd.auc.dk>
To: tsql@cs.arizona.edu
Subject: Benchmark: Groupedness

********************************************************************
*    The TSQL Benchmark Initiative -- Task 2: Database Instance    *
********************************************************************

Below, I discuss the recent insights on the topic of groupedness brought
to the benchmark discussion by Jim. Jim points to an inconsistency in
the current draft of the benchmark. I propose four ways of eliminating
this inconsistency. We should discuss these alternatives and make a
decision.

Best regards,
Christian

Jim presents and discusses four points that we can agree or disagree
upon.

> From info-tsql-sender@cs.arizona.edu Sat Apr 10 00:30:31 1993
> From: Jim Clifford <jcliffor@is-4.stern.nyu.edu>

The first point is summarized as follows.

> the database instance should accord with ALL AND ONLY those
> constraints which are explicitly stated.

I believe that this is a very good ideal for a consensus effort, and I
will adopt this principle in the next draft if there are no objections.

The third point is the observation that no constraints on Name in the
Emp relation, except that Name is a snapshot key, are stated explicitly.
This is indisputable.

The second point is the claim that the proposed database instance
violates the AND ONLY part of the first point in at least one way. The
way the instance is currently described, this is a true claim in my
opinion.

The fourth point has two parts. The first (4a) is that the proposed
database also assumes that attribute Name is time-invariant.
This is a violation of the AND ONLY portion of the first point.

As I indicated in my most recent message on this topic, for Name to be
time-varying, it must be time-varying wrt. something. That something is
a person in the modeled reality. This time-invariance cannot be captured
by functional dependencies unless we include an attribute with values
that represent real-world persons (i.e., a surrogate attribute, not
property attributes such as Name or Social_security_number).

Thus, to eliminate the inconsistency in the benchmark document, we can
do at least four things:

I. We can add to the schema that a person cannot change name. I
certainly would have mentioned this assumption in the straw proposal,
had I thought of it. This way, it would have been an explicit candidate
for discussion. Thanks to Jim for identifying the inconsistency. It can
be argued that this assumption is not worse than the assumption that two
persons cannot have the same name at the same time (the key assumption).
Such assumptions may be made to obtain a simple schema. In a real
application, we may want employee id's or social security numbers as
attribute values.

II. We can remove all references to real-world entities and only talk
about attribute values when we describe the database instance. That way,
Ed and Di could be the same person, and the inconsistency is removed.

III. We can retain the reference to real persons and add a name change
for "*ED*" as proposed by Jim.

IV. Ordinary keys really cannot represent real-world entities properly.
Surrogates are commonly used for this, so if we want to represent
real-world persons in our design, we should perhaps use surrogates. This
line of reasoning leads to a natural next step: We should allow two
real-world persons to have identical (at all times) Names, Salary, Dept,
Gender, and D-birth values. This may lead to duplicate tuples (which are
disallowed) if object identity is not supported fully.
With this extension, our design is realistic: A person may change name,
and two persons may have the same name (and other attributes).

The second part of the fourth point (4b) is that the exclusion of a key
with values that both represent real-world objects and properties which
change over time biases the proposed benchmark in favor of the
tuple-stamped models. For this I have several remarks and some
questions.

1. It is argued that the current db instance favors ungrouped models
(because it avoids data that are hard to cope with in such models).

"Ungrouped" and "tuple-stamped" models appear to be used as synonyms.
Has it been proven, e.g., that a tuple-stamped model must necessarily be
ungrouped? I think a grouped, tuple-stamped model can be designed.

Is it true that all attribute-value-stamped models are grouped? In
Shashi's model, assume that a relation has a Name and a Dept attribute
where both Name and Dept are keys (but persons and departments may
change names). Is it possible in this model to retrieve for each
department the number of persons ever to work in the department? Note
that the query asks for real-world persons and departments. (I chose
Shashi's model because I like it and it is widely known.) Another query:
When was the person currently known as Ed in the department currently
known as the Toy department? If these queries cannot be expressed, is
the attribute-value-stamped model then grouped?

(Jim writes: For a query about *ED* to return COMPLETE information about
him, in Temporally Grouped Models IT IS ONLY NECESSARY for the end user
to know the name of *ED* at some point in time (perhaps, e.g., NOW). The
MODEL is responsible for the rest, because the model manages the
temporal dimension of the data about *ED*, and places (I believe) only
minimal and reasonable demands on the end user.)

2. Jim argues very convincingly that the current db instance does not
help show how grouped models are superior to ungrouped in some respects.
(...just as is shown in a clear way how groupedness is a benefit.)

However, the current db instance also does not help show how, e.g.,
models that can represent future valid times are superior in some
respects to models that can only represent valid times not exceeding the
current time. (Thus, we currently favor models that cannot represent
times exceeding the present time.) For example, in the latter models one
cannot store a fact such as "Ed will be employed in the toy department
for the next year." So far this kind of data has been avoided to get an
initial instance that all models can be happy with.

Among the other omissions are continuous attributes such as temperature
(with the need for multiple interpolation functions to be used on
sampled values). Thus, we currently also favor models that cannot
represent sampled data that need multiple interpolation functions.

The idea has been to try not to put any models in a bad light in the
first benchmark. These examples show that the first version of the
benchmark avoids several aspects, not just the groupedness aspect, that
may put some models in a bad light.

Of course, there may still be good reasons to want to shed light on the
groupedness aspect in the first version. For example, it can be argued
that groupedness is a subtle but important aspect of user-friendliness
that needs to be brought to our attention as fast as possible.
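
To make the two queries under remark 1 concrete, here is a minimal
sketch in Python. It is not Shashi's model, nor any model proposed on
this list; it is just a tuple-stamped Emp relation with a surrogate
added, as in alternative IV. The person_id attribute, the integer time
units, and the data (one person renamed from "Edward" to "Ed", plus Di)
are all made up for illustration, and department renaming is left out.

```python
from collections import defaultdict
from typing import NamedTuple


class EmpTuple(NamedTuple):
    person_id: int  # hypothetical surrogate for the real-world person
    name: str
    dept: str
    start: int      # valid-time interval [start, end), arbitrary units
    end: int


# Made-up instance: person 1 changes name from "Edward" to "Ed" at time 5.
EMP = [
    EmpTuple(1, "Edward", "Toy", 0, 5),
    EmpTuple(1, "Ed", "Toy", 5, 10),
    EmpTuple(2, "Di", "Toy", 3, 10),
]


def persons_per_dept(rows, key):
    """Count the distinct values of attribute `key` seen in each department."""
    seen = defaultdict(set)
    for row in rows:
        seen[row.dept].add(getattr(row, key))
    return {dept: len(values) for dept, values in seen.items()}


def history_of(rows, current_name, now):
    """Return (dept, start, end) for the person called `current_name` at `now`."""
    ids = {
        r.person_id
        for r in rows
        if r.name == current_name and r.start <= now < r.end
    }
    return sorted((r.dept, r.start, r.end) for r in rows if r.person_id in ids)


by_name = persons_per_dept(EMP, "name")         # identifies persons by Name
by_person = persons_per_dept(EMP, "person_id")  # identifies persons by surrogate
ed_history = history_of(EMP, "Ed", now=9)
```

Counting by Name treats "Edward" and "Ed" as two persons, so the Toy
department gets three; counting by the surrogate gives two. Likewise,
the history of the person currently known as Ed includes the period
under the earlier name only because the surrogate ties the tuples
together.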